Abstract

In a recent paper Birke and Bissantz (2008) considered the problem of nonparametric estimation in inverse regression models with convolution-type operators. For multivariate predictors nonparametric methods suffer from the curse of dimensionality, and we therefore consider inverse regression models with the additional qualitative assumption of additivity. In these models several additive estimators are studied. In particular, we investigate estimators under the random design assumption which are applicable when observations are not available on a grid. Finally, we compare this estimator with the marginal integration and the non-additive estimator by means of a simulation study. It is demonstrated that the new method yields a substantial improvement over the currently available procedures.

Keywords: Inverse regression, Additive models, Convolution-type operators 
Mathematical subject codes: primary, 62G08; secondary, 62G15, 62G20 
1 Introduction

Inverse models have numerous applications in such important fields as biology, astronomy, economics or physics, where they have been intensively studied in a deterministic framework [Engl et al. (1996), Saitoh (1997)]. Recently inverse problems have also found considerable interest in the statistical literature. These investigations reflect the demand in applications to quantify the uncertainty of estimates or to validate the model assumptions by the construction of statistical confidence regions or hypothesis tests, respectively [see Mair and Ruymgaart (1996), Kaipio and Somersalo (2010), Bissantz et al. (2007b), Cavalier (2008), Bertero et al. (2009) or Birke et al. (2010) among others]. In this paper we are interested in the convolution type inverse regression model



(1.1)  $Y = g(z) + \varepsilon = \int_{\mathbb{R}^d} \psi(z - t)\,\theta(t)\,dt + \varepsilon$

with a known function $\psi : \mathbb{R}^d \to \mathbb{R}$ [e.g. Adorf (1995)] and a centered noise term $\varepsilon$. The goal of the experiment is to recover the signal $\theta : \mathbb{R}^d \to \mathbb{R}$ from data $(z_1, Y_1), \dots, (z_n, Y_n)$, which is closely related to deconvolution [e.g. Stefanski and Carroll (1990) and Fan (1991)]. Models of the type (1.1) have important applications in the recovery of images from astronomical telescopes or fluorescence microscopes in biology. Therefore statistical inference for the problem of estimating the signal $\theta$ in model (1.1) has become an important field of research in recent years, where the main focus is on a one-dimensional predictor. Bayesian methods have been investigated in Bertero et al. (2009) and Kaipio and Somersalo (2010), and nonparametric methods have been proposed by Mair and Ruymgaart (1996), Cavalier (2008) and Bissantz et al. (2007b) among others.



In the present paper we investigate convergence properties of Fourier-based estimators for the function $\theta$ with the following purposes. Firstly, our research is motivated by the fact that deconvolution problems often arise with a multivariate predictor such as location and time. For this situation Birke and Bissantz (2008) proposed a nonparametric estimate of the signal $\theta$ and derived its asymptotic properties under rather strong assumptions. We will discuss the nonparametric estimation problem for the signal $\theta$ under substantially weaker assumptions. Secondly, because nonparametric estimation usually suffers from the curse of dimensionality, improved estimators incorporating qualitative assumptions such as additivity or multiplicity are investigated under the fixed and the random design assumption. While additive estimation has been intensively discussed for direct problems from different perspectives [see Linton and Nielsen (1995b), Mammen et al. (1999), Carroll et al. (2002), Hengartner and Sperlich (2005), Nielsen and Sperlich (2005), Doksum and Koo (2000), Horowitz and Lee (2005), Lee et al. (2010), Dette and Scheder (2011)], to our best knowledge only one additive estimator is available for indirect inverse regression models so far, where it is assumed that the observations are available on a grid [see Birke et al. (2012)]. In this paper we are particularly interested in two alternative additive estimators. The first one is applicable if observations are available on a grid but has a substantially simpler structure than the method proposed by the last-named authors, which makes it very attractive for practitioners. Moreover, it also yields substantially more precise estimates than the method of Birke et al. (2012). The second estimator is additionally applicable in the case of random predictors.



Thirdly, we will also investigate the case of correlated errors in the inverse regression model (1.1), which has - to our best knowledge - not been considered so far although it appears frequently in applications. Finally, we do not assume that the kernel $\psi$ is periodic, which is a common assumption in inverse regression models with convolution operator [see e.g. Cavalier and Tsybakov (2002)]. Note that for many problems such as the reconstruction of astronomical and biological images from telescopic and microscopic imaging devices this assumption is unrealistic.

The remaining part of this paper is organized as follows. In Section 2 we introduce the necessary notation, different types of designs and estimators studied in this paper. Section 3 is devoted to the asymptotic properties of the estimators and we establish asymptotic normality of all considered (appropriately standardized) statistics. In Section 4 we explain how the results change for dependent data, while Section 5 presents a small simulation study of the finite sample properties of the proposed methods. In particular we compare the new additive estimator with the currently available methods and demonstrate its superiority by a factor 6-8 with respect to mean squared error. Finally all details regarding the proofs of our asymptotic results can be found in Section 6.



2 Preliminaries 



Recall the definition of model (1.1), where we assume that the moments $E[\varepsilon^k]$ exist for all $k \in \mathbb{N}$ such that $E[\varepsilon] = 0$ and $\sigma^2 = E[\varepsilon^2] > 0$. For the sake of transparency we assume at this point that the errors corresponding to different predictors are independent - for the more general case of an error process with an MA($q$)-structure, see Section 4. We will investigate various estimators under two assumptions regarding the explanatory variables $z$.

(FD) Under the fixed design assumption we assume that observations are available on a grid of increasing size. More precisely we consider a sequence $a_n \to 0$ as $n \to \infty$ and assume that at each location $z_k = \frac{k}{n a_n} \in \mathbb{R}^d$ with $k = (k_1, \dots, k_d) \in \{-n, \dots, n\}^d$ a pair of observations $(z_k, Y_k)$ is available in the model

(2.1)  $Y_k = g(z_k) + \varepsilon_k = \int_{\mathbb{R}^d} \psi(z_k - t)\,\theta(t)\,dt + \varepsilon_k,$

where $\{\varepsilon_k \mid k \in \{-n, \dots, n\}^d\}$ are independent and identically distributed random variables. Under this assumption the sample size is $N = (2n + 1)^d$. Note that formally the random variables $\{Y_k \mid k \in \{-n, \dots, n\}^d\}$ form a triangular array, but we do not reflect this dependence in the notation. In other words we will use the notation $Y_k, z_k, \varepsilon_k$ instead of $Y_{k,n}, z_{k,n}, \varepsilon_{k,n}$ throughout this paper.
(RD) Under the random design assumption we assume that the explanatory variables are realizations of independent, identically distributed random variables $X_{1,n}, \dots, X_{n,n}$ with a density $f_n$. Again we will not reflect the triangular structure in the notation and use $Y_k$, $X_k$, $\varepsilon_k$ and $f$ instead of $Y_{k,n}$, $X_{k,n}$, $\varepsilon_{k,n}$ and $f_n$, respectively, that is

(2.2)  $Y_k = g(X_k) + \varepsilon_k = \int_{\mathbb{R}^d} \psi(X_k - t)\,\theta(t)\,dt + \varepsilon_k; \quad k \in \{1, \dots, n\},$

where $\varepsilon_1, \dots, \varepsilon_n$ are independent identically distributed random variables. Under this assumption the sample size is $N = n$.



We will use different estimators in both scenarios (2.1) and (2.2). Note that assumption (FD) assumes that observations are available on a complete $d$-dimensional grid with spacing $\frac{1}{n a_n}$. In this case an estimator of the signal $\theta$ has also been studied by Birke and Bissantz (2008). The estimator in model (2.2) under assumption (RD), which is proposed in the following section, could also be used if not all observations are available on the grid.
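The two sampling schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fixed_design`, `random_design` and `observe` are hypothetical, the random design is taken to be uniform on the cube $[-1/a_n, 1/a_n]^d$ (the text allows a general density $f$), and the noise is taken to be Gaussian.

```python
import itertools
import random


def fixed_design(n, a_n, d):
    """Grid points z_k = k / (n * a_n) for k in {-n, ..., n}^d (design FD)."""
    scale = 1.0 / (n * a_n)
    return [tuple(k_i * scale for k_i in k)
            for k in itertools.product(range(-n, n + 1), repeat=d)]


def random_design(n, a_n, d, rng):
    """n i.i.d. predictors (design RD); the uniform law is an assumption."""
    bound = 1.0 / a_n
    return [tuple(rng.uniform(-bound, bound) for _ in range(d))
            for _ in range(n)]


def observe(points, g, sigma, rng):
    """Responses Y = g(point) + eps with centered Gaussian noise (assumed)."""
    return [g(p) + rng.gauss(0.0, sigma) for p in points]
```

Under (FD) the call `fixed_design(n, a_n, d)` returns $N = (2n+1)^d$ locations, whereas under (RD) the sample size equals the number `n` of draws.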



2.1 Unrestricted estimation for random design 

Fourier-based estimators have been considered by numerous authors in the univariate case (e.g. Diggle and Hall (1993), Mair and Ruymgaart (1996), Cavalier and Tsybakov (2002) and Bissantz et al. (2007a)) and their generalization to the multivariate case considered in the models (2.1) and (2.2) is straightforward. For model (2.1) a Fourier-based estimator is given by

(2.3)  $\hat\theta^{FD}(x^*) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} e^{-i\langle w, x^*\rangle}\, \Phi_K(hw)\, \frac{\hat\Phi^{FD}(w)}{\Phi_\psi(w)}\, dw,$

where

$\hat\Phi^{FD}(w) = \frac{1}{n^d a_n^d} \sum_{k \in \{-n,\dots,n\}^d} Y_k\, e^{i\langle w, z_k\rangle}$

denotes the empirical Fourier transform, $\langle v, w\rangle$ is the standard inner product of the vectors $v, w \in \mathbb{R}^d$ and $\Phi_K$ and $\Phi_\psi$ denote the Fourier transform of a kernel function $K$ and the convolution function $\psi$ (which is assumed to be known), respectively. Moreover, in (2.3) the quantity $h$ is a bandwidth converging to $0$ with increasing sample size. Birke et al. (2012) used this estimator to construct improved estimators under the qualitative assumption of additivity in the case of a fixed design. In Section 2.3 we will propose an alternative additive estimator in the case of fixed design, which provides a notable improvement of the estimator proposed by the last-named authors.
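A minimal numerical sketch of the estimator (2.3) may help to fix ideas. It is restricted to $d = 1$ and rests on two assumptions that are not part of the general definition: the convolution function is the Laplace density $\frac{\lambda}{2} e^{-\lambda |x|}$ with Fourier transform $\lambda^2/(\lambda^2 + w^2)$, and the kernel is the sinc kernel, so that $\Phi_K$ is the indicator of $[-1, 1]$. The function names and the trapezoidal quadrature are illustrative choices.

```python
import cmath
import math


def laplace_ft(w, lam):
    """Fourier transform of the Laplace density (lam/2) * exp(-lam * |x|)."""
    return lam * lam / (lam * lam + w * w)


def empirical_ft_fd(w, y, z, n, a_n):
    """Empirical Fourier transform (n a_n)^{-1} sum_k Y_k exp(i w z_k), d = 1."""
    return sum(y_k * cmath.exp(1j * w * z_k) for y_k, z_k in zip(y, z)) / (n * a_n)


def theta_hat_fd(x, y, z, n, a_n, h, lam, steps=200):
    """Estimator (2.3) for d = 1 with Phi_K = indicator of [-1, 1].

    Phi_K(h w) vanishes outside [-1/h, 1/h], so the w-integral is taken over
    that interval and approximated by the trapezoidal rule.
    """
    upper = 1.0 / h
    dw = 2.0 * upper / steps
    total = 0.0
    for j in range(steps + 1):
        w = -upper + j * dw
        weight = 0.5 if j in (0, steps) else 1.0
        term = (cmath.exp(-1j * w * x) * empirical_ft_fd(w, y, z, n, a_n)
                / laplace_ft(w, lam))
        total += weight * term.real  # imaginary parts cancel by symmetry
    return total * dw / (2.0 * math.pi)
```

The division by `laplace_ft` is the step that makes the problem ill-posed: the smaller $h$, the larger the frequencies at which the noise in the empirical Fourier transform is amplified.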



For a random design we will use the same Fourier-based estimator as defined in (2.3), where the empirical Fourier transform $\hat\Phi^{FD}(w)$ in (2.3) is replaced by

(2.4)  $\hat\Phi^{RD}(w) = \frac{1}{n} \sum_{k=1}^n e^{i\langle w, X_k\rangle}\, \frac{Y_k}{\max\{f(X_k), f(\frac{1}{a_n})\}},$

where $\frac{1}{a_n} = (\frac{1}{a_n}, \dots, \frac{1}{a_n}) \in \mathbb{R}^d$ and $a_n$ is again a sequence converging to $0$ with increasing sample size. The resulting estimator will be denoted by $\hat\theta^{RD}(x^*)$. In (2.4) $f$ denotes the density of $X_1$ and we take the maximum of $f(X_k)$ and $f(\frac{1}{a_n})$ to ensure that the variance of $\hat\theta^{RD}(x^*)$ is bounded. We also note that the estimator $\hat\theta^{RD}$ admits the representation

(2.5)  $\hat\theta^{RD}(x^*) = \sum_{k=1}^n Y_k\, w_n(x^*, X_k),$

where the weights are given by

(2.6)  $w_n(x^*, X_k) = \frac{1}{n \max\{f(X_k), f(\frac{1}{a_n})\}\,(2\pi)^d} \int_{\mathbb{R}^d} e^{-i\langle w, x^* - X_k\rangle}\, \frac{\Phi_K(hw)}{\Phi_\psi(w)}\, dw.$
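The role of the truncation $\max\{f(X_k), f(1/a_n)\}$ in (2.4) can be seen in a short sketch for $d = 1$. The function name `empirical_ft_rd` and the way the density is passed as a callable are assumptions made for illustration only.

```python
import cmath


def empirical_ft_rd(w, x_obs, y_obs, density, a_n):
    """Empirical Fourier transform (2.4) for d = 1.

    Each response is divided by max{f(X_k), f(1/a_n)}, so observations in
    regions of very low design density cannot receive unbounded weight.
    """
    floor = density(1.0 / a_n)
    total = 0.0 + 0.0j
    for x_k, y_k in zip(x_obs, y_obs):
        total += cmath.exp(1j * w * x_k) * y_k / max(density(x_k), floor)
    return total / len(x_obs)
```

For a design density that is bounded below on $[-1/a_n, 1/a_n]$ by its value at $1/a_n$, the floor is never active inside the cube; it only limits the weights of observations outside it.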

Remark 2.1 Note that we use the same bandwidth for all components of the predictor. This 
assumption is made for the sake of a transparent presentation of the results. In applications the 
components of the vector x represent different physical quantities such that different bandwidths 
have to be used. All results presented in this paper can be modified to this case with an additional 
amount of notation. 



2.2 Estimation of additive inverse regression models 



It is well known that in practical applications nonparametric methods as introduced in Section 2.1 suffer from the curse of dimensionality and therefore do not yield precise estimates of the signal $\theta$ with a multivariate predictor. A common approach in nonparametric statistics to deal with this problem is to postulate an additive structure of the signal $\theta$, that is

(2.7)  $\theta(x^*) = \theta^{add}(x^*) := \theta_0^{add} + \sum_{j=1}^m \theta_{I_j}^{add}(x^*_{I_j})$

[see Hastie and Tibshirani (2008)]. Here $\{I_1, \dots, I_m\}$ denotes a partition of the set $\{1, \dots, d\}$ with cardinalities $|I_j| = d_j$ and $x^*_{I_j}$ is the vector which includes all components of the vector $x^*$ with corresponding indices $i \in I_j$. Furthermore $\theta_0^{add}$ is a constant and $\theta_{I_j}^{add} : \mathbb{R}^{d_j} \to \mathbb{R}$ denote functions normalized such that

$\int_{\mathbb{R}^{d_j}} \theta_{I_j}^{add}(x)\, dx = 0 \quad (j = 1, \dots, m).$

Note that the completely additive case is obtained for the choice $m = d$, that is $d_1 = \dots = d_d = 1$. In the case of direct regression models several estimation techniques such as marginal integration [see Linton and Nielsen (1995b), Carroll et al. (2002), Hengartner and Sperlich (2005)] and backfitting [see Mammen et al. (1999), Nielsen and Sperlich (2005)] have been proposed in the literature. Recently the estimation problem of an additive (direct) regression model has also found considerable interest in the context of quantile regression [see Doksum and Koo (2000), De Gooijer and Zerom (2003), Horowitz and Lee (2005), Lee et al. (2010), Dette and Scheder (2011) among others] but - to our best knowledge - only one estimator has been proposed for additive inverse regression models under the assumption that observations are available on a grid [see Birke et al. (2012)]. For this situation we will propose an alternative estimator in the following section, which yields an improvement by a factor 6-10 with respect to mean squared error (see our numerical results in Section 5).



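To construct an estimator in the additive inverse regression model (2.7) with random design we apply the marginal integration method introduced in Linton and Nielsen (1995a) to the statistic defined in (2.5). To this end we consider weighting functions $Q_{I_1}, \dots, Q_{I_m}$, $Q_{I_j} : \mathbb{R}^{d_j} \to \mathbb{R}$, and define

(2.8)  $Q(x^*) = Q_{I_1}(x^*_{I_1}) \cdots Q_{I_m}(x^*_{I_m}),$
       $Q_{I_j^c}(x^*_{I_j^c}) = Q_{I_1}(x^*_{I_1}) \cdots Q_{I_{j-1}}(x^*_{I_{j-1}})\, Q_{I_{j+1}}(x^*_{I_{j+1}}) \cdots Q_{I_m}(x^*_{I_m}),$

where $I_j^c = \{1, \dots, d\} \setminus I_j$. With this notation we introduce the quantities

(2.9)   $\alpha_{j,Q_{I_j^c}}(x^*_{I_j}) = \int_{\mathbb{R}^{d-d_j}} \theta(x^*)\, dQ_{I_j^c}(x^*_{I_j^c}), \quad j = 1, \dots, m,$
(2.10)  $c = \int_{\mathbb{R}^d} \theta(x^*)\, dQ(x^*).$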



Now let $\hat\theta^{RD}$ denote the unrestricted estimator introduced in Section 2.1 for the random design model, then the additive estimator for the signal $\theta$ is finally defined by

(2.11)  $\hat\theta^{add,RD}(x^*) = \hat\alpha_{1,Q_{I_1^c}}(x^*_{I_1}) + \dots + \hat\alpha_{m,Q_{I_m^c}}(x^*_{I_m}) - (m-1)\,\hat c,$

where $\hat c$ and $\hat\alpha_{j,Q_{I_j^c}}$ denote estimates for the quantities $c$ and $\alpha_{j,Q_{I_j^c}}$, which are obtained by replacing in (2.9) and (2.10) the signal $\theta$ by its estimator $\hat\theta^{RD}$, respectively. Recalling the definition of the unrestricted estimator in (2.3) and (2.4), we obtain from (2.9) the representation

(2.12)  $\hat\alpha_{j,Q_{I_j^c}}(x^*_{I_j}) = \sum_{k=1}^n Y_k\, w_n^{add}(x^*_{I_j}, X_k),$

where the weights are given by

$w_n^{add}(x^*_{I_j}, X_k) = \frac{1}{n h^d (2\pi)^d} \int_{\mathbb{R}^d} e^{i\langle w, X_k\rangle/h}\, e^{-i\langle w_{I_j}, x^*_{I_j}\rangle/h}\, L_{I_j^c}\!\left(\frac{w_{I_j^c}}{h}\right) \frac{\Phi_K(w)}{\Phi_\psi(\frac{w}{h})}\, dw \times \frac{1}{\max\{f(X_k), f(\frac{1}{a_n})\}}$

and

$L_{I_j^c}(y) = \int_{\mathbb{R}^{d-d_j}} e^{-i\langle y, x\rangle}\, dQ_{I_j^c}(x).$



2.3 An alternative additive estimator for a fixed design 

In principle the marginal integration estimator could also be used under the fixed design assumption (FD) and its asymptotic properties have been studied by Birke et al. (2012). However, it turns out that for observations on a grid a simpler and more efficient estimator can be defined. This idea is closely related to the backfitting approach. To be precise we note that the assumption of additivity for the signal $\theta$ implies additivity of the observable signal $g$ due to the linearity of the convolution operator. Hence, model (2.1) is equivalent to

(2.13)  $Y_k = g_0 + g_{I_1}(z_{k_{I_1}}) + \dots + g_{I_m}(z_{k_{I_m}}) + \varepsilon_k,$

where $g_0 = \int_{\mathbb{R}^d} \psi(z - t)\,\theta_0^{add}\, dt$,

(2.14)  $g_{I_j}(z_{k_{I_j}}) = \int_{\mathbb{R}^{d_j}} \psi_{I_j}(z_{k_{I_j}} - t_{I_j})\, \theta_{I_j}^{add}(t_{I_j})\, dt_{I_j} \quad (j = 1, \dots, m)$

and $\psi_{I_1}, \dots, \psi_{I_m}$ are the marginals of $\psi$, that is

$\psi_{I_j}(t_{I_j}) = \int_{\mathbb{R}^{d-d_j}} \psi(t)\, dt_{I_j^c}.$

Recall the definition of $k_{I_j}$ and $k_{I_j^c}$ as the $d_j$- and $(d - d_j)$-dimensional vectors corresponding to the components $(k_l \mid l \in I_j)$ and $(k_l \mid l \in I_j^c)$ of the vector $k = (k_1, \dots, k_d)$, respectively. In order to define estimators of these terms we consider the empirical Fourier transforms in dimension $d_j$

$\hat\Phi_{I_j}(w) = \frac{1}{(n a_n)^{d_j}} \sum_{k_{I_j} \in \{-n,\dots,n\}^{d_j}} Z_{k_{I_j}}\, e^{i\langle w, z_{k_{I_j}}\rangle} \quad (j = 1, \dots, m),$

where the random variables $Z_{k_{I_j}}$ are given by

(2.15)  $Z_{k_{I_j}} = \frac{1}{(2n+1)^{d-d_j}} \sum_{k_{I_j^c} \in \{-n,\dots,n\}^{d-d_j}} Y_k.$
The additive estimator is now defined by

(2.16)  $\hat\theta^{add,FD}(x^*) = \hat\theta_0 + \hat\theta_{I_1}^{FD}(x^*_{I_1}) + \dots + \hat\theta_{I_m}^{FD}(x^*_{I_m}),$



where 



»0 

ke{-n,...,n} d 



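(2.17)  $\hat\theta_{I_j}^{FD}(x^*_{I_j}) = \frac{1}{(2\pi)^{d_j}} \int_{\mathbb{R}^{d_j}} e^{-i\langle w, x^*_{I_j}\rangle}\, \Phi_K(hw)\, \frac{\hat\Phi_{I_j}(w)}{\Phi_{\psi_{I_j}}(w)}\, dw \quad (j = 1, \dots, m).$

Note that by the lattice structure the statistic $Z_{k_{I_j}}$ in (2.15) is a $\sqrt{n^{d-d_j}}$-consistent estimator of $g_{I_j}(z_{k_{I_j}})$. Therefore the deconvolution problem for the $j$-th component is reduced to a problem in dimension $d_j$ and the estimator $\hat\theta_{I_j}^{FD}(x^*_{I_j})$ can be rewritten as

(2.18)  $\hat\theta_{I_j}^{FD}(x^*_{I_j}) = \sum_{k_{I_j} \in \{-n,\dots,n\}^{d_j}} Z_{k_{I_j}}\, w_{k_{I_j},n}(x^*_{I_j}),$

where the weights are defined by

(2.19)  $w_{k_{I_j},n}(x^*_{I_j}) = \frac{1}{(2\pi)^{d_j} (n a_n)^{d_j}} \int_{\mathbb{R}^{d_j}} e^{-i\langle w, x^*_{I_j} - z_{k_{I_j}}\rangle}\, \frac{\Phi_K(hw)}{\Phi_{\psi_{I_j}}(w)}\, dw.$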
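The averaging step (2.15), which reduces each component to a lower-dimensional deconvolution problem, can be sketched as follows for the completely additive case $d = 2$, $I_1 = \{1\}$, $I_2 = \{2\}$. The function name `partial_averages` and the dictionary layout of the data are assumptions made for this illustration.

```python
def partial_averages(y, n):
    """Averages (2.15) for d = 2, I_1 = {1}, I_2 = {2}.

    `y[(k1, k2)]` holds the response at grid index (k1, k2) in {-n, ..., n}^2.
    Returns two dicts: Z_{k1} (average over k2) and Z_{k2} (average over k1).
    """
    idx = range(-n, n + 1)
    size = 2 * n + 1
    z_first = {k1: sum(y[(k1, k2)] for k2 in idx) / size for k1 in idx}
    z_second = {k2: sum(y[(k1, k2)] for k1 in idx) / size for k2 in idx}
    return z_first, z_second
```

Each average is taken over $2n + 1$ independent errors, which is the source of the variance reduction by the factor $(2n+1)^{d - d_j}$ before the one-dimensional Fourier inversion (2.17) is applied.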

2.4 Technical Assumptions 

In the following Section we will derive important asymptotic properties of the proposed estimators. For this purpose the following assumptions are required, where different statements in the following discussion require different parts of these assumptions. Throughout this paper $\|\cdot\|$ denotes the Euclidean norm and the symbol $a_n \sim b_n$ means that $\lim_{n\to\infty} a_n / b_n = c$ for some positive constant $c$.

Assumption 1

(A) Under the random design assumption the Fourier transform $\Phi_\psi$ of the function $\psi$ satisfies (as $h \to 0$)



for some $\beta > 0$ and constants $C_1, C_2 > 0$.
(B) Under the fixed design and additivity assumption the Fourier transforms $\Phi_{\psi_{I_j}}$ of the marginals $\psi_{I_j}$ of $\psi$ satisfy



*' f<W)|2 ,iw~C 2 ft-^ 



for some $\beta_j > 0$ $(j = 1, \dots, m)$ and constants $C_1, C_2 > 0$.



Assumption 2 



(A) Under the random design assumption the Fourier transform $\Phi_K$ of the kernel $K$ in (2.3) is symmetric, supported on the cube $[-1, 1]^d$ and there exists a constant $b \in (0, 1]$ such that $\Phi_K(w) = 1$ for $w \in [-b, b]^d$ and $|\Phi_K(w)| \le 1$ for all $w \in [-1, 1]^d$.

(B) Under the fixed design and additivity assumption the Fourier transform $\Phi_K$ of the kernel $K$ is symmetric and supported on $[-1, 1]^{d_j}$ and there exists a constant $b \in (0, 1]$ such that $\Phi_K(w) = 1$ for $w \in [-b, b]^{d_j}$ and $|\Phi_K(w)| \le 1$ for all $w \in [-1, 1]^{d_j}$, for all $j = 1, \dots, m$.

Assumption 3 



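(A) The Fourier transform $\Phi_\theta$ of the signal $\theta$ in model (1.1) exists and satisfies

$\int_{\mathbb{R}^d} |\Phi_\theta(w)|\, \|w\|^{s-1}\, dw < \infty$ for some $s > 1$.

(B) The function $g$ in model (1.1) satisfies

$\int_{\mathbb{R}^d} |g(z)|\, \|z\|^r\, dz < \infty$

for some $r > 0$ such that $a_n^r = O(h^{\beta + d + s - 1})$.

(C) The Fourier transforms $\Phi_{\theta_{I_1}^{add}}, \dots, \Phi_{\theta_{I_m}^{add}}$ of the functions $\theta_{I_1}^{add}, \dots, \theta_{I_m}^{add}$ in the additive model (2.7) satisfy

$\int_{\mathbb{R}^{d_j}} |\Phi_{\theta_{I_j}^{add}}(w)|\, \|w\|^{s-1}\, dw < \infty$ for some $s > 1$ and $j = 1, \dots, m$.

(D) The functions $g_{I_1}, \dots, g_{I_m}$ defined in (2.14) satisfy

$\int_{\mathbb{R}^{d_j}} |g_{I_j}(z)|\, \|z\|^r\, dz < \infty$ for $j = 1, \dots, m$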

for some r > such that a« d] = 0(h^ j 
Assumption 4 For each $n \in \mathbb{N}$ let $X_1, \dots, X_n$ denote independent identically distributed $d$-dimensional random variables with density $f$ (which may depend on $n$) such that $f(x) \neq 0$ for all $x \in [-\frac{1}{a_n}, \frac{1}{a_n}]^d$. Furthermore we assume that for sufficiently large $n \in \mathbb{N}$

$f(x) \ge f(\tfrac{1}{a_n})$ for $x \in [-\frac{1}{a_n}, \frac{1}{a_n}]^d$.

The final assumption is required for the marginal integration estimator and is an extension of Assumption 1. For a precise statement we define for $y \in \mathbb{R}^{d - d_j}$

(2.20)  $L_{I_j^c}(y) = \int_{\mathbb{R}^{d-d_j}} e^{-i\langle y, x\rangle}\, dQ_{I_j^c}(x),$

where $Q_{I_j^c}$ is defined in (2.8).



Assumption 5 There exist positive constants $\gamma_1, \dots, \gamma_m$ such that the Fourier transform $\Phi_\psi$ of the convolution function $\psi$ satisfies
2 



(A) L 



L jc 



m) 



(B) k^e-'^^ where 

(C) / B -(nr=ii^(?)r)i^p^ = o('» 



7n 



min™ x 7i 



■)■ 



Remark 2.2 



1. The common assumption on the convolution function $\psi$ is

(2.21)  $\Phi_\psi(w)\, \|w\|^\beta \to C$ as $\|w\| \to \infty$

[see Birke and Bissantz (2008)]. Assumption 1 is substantially weaker because we do not assume $\Phi_\psi$ to be asymptotically radial-symmetric. It is satisfied for many commonly used convolution functions such as the multivariate Laplace density or the density of several Gamma distributions such as the Exponential distribution, for which (2.21) does not hold.

2. Assumptions 3(A) and 3(B) will not be required for the new additive estimator introduced in Section 2.3 under the fixed design assumption. As a consequence the asymptotic theory for the new estimator in the completely additive case $m = d$ ($d_1 = \dots = d_m = 1$) does not require the additive functions to have compact support as it is assumed in Birke et al. (2012).

3. Assumptions 3(B) and 3(D) are needed for the computation of the bias, where we have to ensure that $g(x)$ converges sufficiently fast to zero as $\|x\| \to \infty$. Note that we only observe data on the cube $[-\frac{1}{a_n}, \frac{1}{a_n}]^d$.

4. The results of this Section can be extended to multiplicative signals of the form

(2.22)  $\theta(x^*) = \prod_{j=1}^m \theta_{I_j}(x^*_{I_j}).$

The details are omitted for the sake of brevity.
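Item 1 of Remark 2.2 can be illustrated by a short calculation. It is a sketch under two assumptions: the multivariate Laplace density is taken in product form with a common rate $\lambda > 0$, and the kernel is the sinc kernel with $\Phi_K = 1_{[-1,1]^d}$.

```latex
% Assumptions: \psi(x) = \prod_{j=1}^d \frac{\lambda}{2} e^{-\lambda |x_j|},
%              \Phi_K = 1_{[-1,1]^d} (sinc kernel).
\begin{align*}
\Phi_\psi(w) &= \prod_{j=1}^d \frac{\lambda^2}{\lambda^2 + w_j^2}, \\
\Phi_\psi(t e_1) &= \frac{\lambda^2}{\lambda^2 + t^2} \sim \lambda^2 t^{-2}, \qquad
\Phi_\psi\Big(\tfrac{t}{\sqrt{d}}(1,\dots,1)\Big)
  = \Big(\frac{d\lambda^2}{d\lambda^2 + t^2}\Big)^{d} \sim (d\lambda^2)^{d}\, t^{-2d}
  \quad (t \to \infty), \\
\int_{[-1,1]^d} \frac{|\Phi_K(w)|^2}{|\Phi_\psi(w/h)|^2}\, dw
 &= \prod_{j=1}^d \int_{-1}^{1} \Big(1 + \frac{w_j^2}{\lambda^2 h^2}\Big)^{2} dw_j
  = \Big(2 + \frac{4}{3\lambda^2 h^2} + \frac{2}{5\lambda^4 h^4}\Big)^{d}
  \sim \Big(\frac{2}{5\lambda^4}\Big)^{d} h^{-4d} \quad (h \to 0).
\end{align*}
```

Along a coordinate axis $\Phi_\psi$ decays like $\|w\|^{-2}$, along the diagonal like $\|w\|^{-2d}$, so for $d \ge 2$ no single exponent $\beta$ gives a finite positive limit in (2.21). The integral, in contrast, has the exact order $h^{-4d}$, which is compatible with the value $\beta = 2d$ quoted for the Laplace density in Remark 3.2.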



Example 2.3 In order to demonstrate that the assumptions are satisfied in many cases of practical importance we consider as an example Assumptions 1 and 5 and a two-dimensional additive signal, that is $x = (x_1, x_2)$,

$\theta(x_1, x_2) = \theta_1(x_1) + \theta_2(x_2),$

($I_1 = I_2^c = \{1\}$, $I_2 = I_1^c = \{2\}$). For the convolution function in (1.1) and the weight (2.8) we choose



V(x) 

Q(x) 

respectively, and the kernel K is given by 

JT(x) 



A 2 



-\(\X1\ + \X2\) 



sm(^xij sm{x 2 ) 

TT 2 XiX 2 



The integrals in Assumptions [T] and [5] are therefore obtained by a straightforward calculation 



/ 



|$k(w)| 



R 2 



dw = 



[-i.iF 



/i 2 



3h 2 



|$A-(w)| 2 
l<Mx)| 2 



[-l,i] 2 



h 2 



h 2 



(Jw 



5/i 4 3/i 2 



'a 2 



2 |$g(w)| ; 
l<Mf)l 2 



4/i 2 |sin(^) | 2 (1 



h 1 



-dm 



tl'7 



I5h e 



o{h~ 
where we define $\mathrm{Si}(x) = \int_0^x \frac{\sin t}{t}\, dt$.



2 
3 = 1 



i{w!. ,xj )//> 



Li 



3_ 

h 



|$if(w)| 
l<Mf)l 



-dw = h? 



-1,112 



-iaiHi/h S ' D ( h ) , e -iiC2X 2 /h S ' D ( h . 



W-2 
„2N 2 



2 |^(w)| 2 

l<Mir)l 2 



dw = 



,2\ 2 



1,11 



4tf| sin (^) | 2 (1 + ^ 2 16 

^ H =9^ 



o(/T 



3 Asymptotic properties 
3.1 Unrestricted estimator 

In the following we discuss the weak convergence of the unrestricted estimator $\hat\theta^{RD}$ for the signal $\theta$. In the case of a fixed design on a grid (assumption (FD)) the asymptotic properties of this estimator have been studied in Birke and Bissantz (2008). Therefore we restrict ourselves to model (2.2) corresponding to the random design assumption, for which the situation is substantially more complicated. Here the estimator is given by

(3.1)  $\hat\theta^{RD}(x^*) = \frac{1}{n h^d (2\pi)^d} \sum_{k=1}^n \int_{\mathbb{R}^d} e^{-i\langle w, x^* - X_k\rangle/h}\, \frac{\Phi_K(w)}{\Phi_\psi(\frac{w}{h})}\, dw\; \frac{Y_k}{\max\{f(X_k), f(\frac{1}{a_n})\}}$

and its asymptotic properties are described in our first main result, which is proved in the appendix. Throughout this paper the symbol $\Rightarrow$ denotes weak convergence.



Theorem 3.1 Consider the inverse regression model (2.2) under the random design assumption
(RD). Let Assumptions^A),^^A),^B),Qand^\ be fulfilled and h -> and a n ->■ as n — > oo 
such that 

n^V-^/te 1 ) 172 ~> oo and n l ' 2 h Zd ' 2 /(a; 1 ) 3 / 2 -> oo. 



Furthermore, assume that the errors in model (2.2) are independent, identically distributed with mean zero and variance $\sigma^2$. Then

(3.2)  $V_1^{-1/2}\,\big(\hat\theta^{RD}(x^*) - E[\hat\theta^{RD}(x^*)]\big) \Rightarrow \mathcal{N}(0, 1),$

where $E[\hat\theta^{RD}(x^*)] = \theta(x^*) + O(h^{s-1})$ and the normalizing sequence
bounded by 

(3.4) qn 1 / 2 /^ 2 ^/^ 1 ) 1 / 2 < vr 1/2 < ay /2 /* d/2+/3 . 



Remark 3.2 Note that the rate of convergence in Theorem 3.1 depends sensitively on the design density. We demonstrate this by providing two examples, one for the fastest and one for the slowest possible rate. First, assume that the predictors are uniformly distributed on the cube $[-\frac{1}{a_n}, \frac{1}{a_n}]^d$ and that the convolution function is the $d$-dimensional Laplace density function. This
yields /3 — 2d in Assumption 1 and we get a rate of convergence of order n 1 / 2 ^/ 2 ^ 2 , which 



is exactly the lower bound in Theorem 3.1 and coincides with the rate in the fixed design case. 



However, a rate of order n 1//2 /i M//2 is obtained for the design density 

d 

f(x u ...,x d ) = Y[g a ,b(x k ), 

k=l 

where the function g a ^ : R — > R is defined by 



9a,b{ X ) 



if x e [-1,1] 



— plsp 



and the parameters a and b are given by b > 1, a = (2 + ^) 1 . In this case we have 

vr l/2 ~ n- 1 / 2 /^/ 2 + n- 1 / 2 /*- 2 ^-^ 1 )/ 2 . 

For the choice /i = o{a h ~ 1 ) we therefore obtain V l ~ n~ 1 / 2 h~ 5d / 2 . 
3.2 Additive estimation for random design 



In this Section we consider the marginal integration estimator $\hat\theta^{add,RD}$ defined in (2.11) under the random design assumption. Lemma 3.3 below gives the asymptotic behaviour of the $j$-th component $\hat\alpha_{j,Q_{I_j^c}}$ and Theorem 3.5 the asymptotic distribution of $\hat\theta^{add,RD}$. The proofs are complicated and also deferred to Section 6.

Lemma 3.3 If Assumptions [ljA), |2j |3](C), [3^D), [1] and [5] are satisfied and 

rl l/2^+<i/2- 7j -/2 /(a -l ) l/2 ^ ^ md n l/2^3/2( d - 7 ,) / ( a -l) ^ oo 
as $n \to \infty$. Then the appropriately standardized estimator $\hat\alpha_{j,Q_{I_j^c}}(x^*_{I_j})$ defined in (2.12) converges weakly to a standard normal distribution, that is

(3.5)  $V_2^{-1/2}\,\big(\hat\alpha_{j,Q_{I_j^c}}(x^*_{I_j}) - E[\hat\alpha_{j,Q_{I_j^c}}(x^*_{I_j})]\big) \Rightarrow \mathcal{N}(0, 1)$

for $j = 1, \dots, m$, where $E[\hat\alpha_{j,Q_{I_j^c}}(x^*_{I_j})] = \alpha_{j,Q_{I_j^c}}(x^*_{I_j}) + O(h^{s-1})$ and the standardizing factor

j 3 ' i J 

2 



satisfies 



C,n 1/2 h d/2+e ~"" /(a' 1 ) 1 ' 2 < Vf 1 ' 2 < Cy'V 2 *^'. 

Remark 3.4 Similar to the unrestricted case, the rate of convergence depends on the design 
density /. Note that under the given assumptions the rate of convergence of the estimator 6tj t Q c 
is by the factor h? 3 faster than the rate of the unrestricted estimator. 

Theorem 3.5 If Assumptions [l^A), |5J |3](C), [3]^D), [4] and [5] are satisfied and 

n ^ + (3 d+7mi „/)2 /(a -l ) 2 ^ ri l/2^+( d - 7mi „)/2 /(a -l ) l/2 ^ ^ 

„V(^)/(O^M (J = l,...,m) 

as $n \to \infty$, then the appropriately standardized additive estimator $\hat\theta^{add,RD}$ converges weakly to a standard normal distribution, that is

(3.6)  $V_3^{-1/2}\,\big(\hat\theta^{add,RD}(x^*) - E[\hat\theta^{add,RD}(x^*)]\big) \Rightarrow \mathcal{N}(0, 1),$

where $E[\hat\theta^{add,RD}(x^*)] = \theta^{add}(x^*) + O(h^{s-1})$ and the standardizing factor



V* 



satisfies 



1 



2d 



3=1 



2 (Jl 



a* + g( S y)f(s) 



max{/(«),/(£)} 



3.3 Additive estimator for fixed design 



The asymptotic properties of the additive estimator $\hat\theta^{add,RD}$ defined in (2.11) under the fixed design assumption have been studied by Birke et al. (2012) and in this Section we investigate the asymptotic properties of the alternative estimator defined in Section 2.3. Our first result, Lemma 3.6, gives the weak convergence of $\hat\theta_{I_j}^{FD}$, whereas Theorem 3.7 contains the asymptotic distribution of the estimator $\hat\theta^{add,FD}$ defined in (2.16). The proofs are again deferred to Section 6.



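Lemma 3.6 Consider the inverse regression model under the fixed design assumption (FD). Let Assumptions 1(B), 2 and 3(D) be fulfilled for some $j \in \{1, \dots, m\}$, $h \to 0$ and $a_n \to 0$ as $n \to \infty$ such that

$n^d h^{d_j + 2\beta_j} a_n^{d_j} \to \infty$ and $n^2 h^{2 + d_j + \beta_j} a_n^3 \to \infty,$

then

(3.7)  $U_{n,j}(x^*_{I_j})^{-1/2}\,\big(\hat\theta_{I_j}^{FD}(x^*_{I_j}) - E[\hat\theta_{I_j}^{FD}(x^*_{I_j})]\big) \Rightarrow \mathcal{N}(0, 1),$

where the normalizing sequence is defined by

$U_{n,j}(x^*_{I_j}) = \frac{\sigma^2}{(2n+1)^{d-d_j}} \sum_{k_{I_j} \in \{-n,\dots,n\}^{d_j}} \big(w_{k_{I_j},n}(x^*_{I_j})\big)^2,$

the weights are defined in (2.19) and

$E[\hat\theta_{I_j}^{FD}(x^*_{I_j})] = \theta_{I_j}^{add}(x^*_{I_j}) + O(h^{s-1}) + O(n^{-2} h^{-d_j - \beta_j - 2} a_n^{-3}).$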



The result of Theorem 3.7 below follows immediately from Lemma 3.6. The bias is of the same order as the bias in Lemma 3.6 and we define $j^* = \operatorname{argmax}_j\, (d_j + 2\beta_j)$.



Theorem 3.7 Consider the inverse regression model under the fixed design assumption (FD). Let Assumptions 1, 2, 3(C) and 3(D) be fulfilled, $h \to 0$ and $a_n \to 0$ as $n \to \infty$ such that

$n^d h^{d_{j^*} + 2\beta_{j^*}} a_n^{d_{j^*}} \to \infty$ and $n^2 h^{2 + d_{j^*} + \beta_{j^*}} a_n^3 \to \infty.$

Then

(3.8)  $U_n(x^*)^{-1/2}\,\big(\hat\theta^{add,FD}(x^*) - E[\hat\theta^{add,FD}(x^*)]\big) \Rightarrow \mathcal{N}(0, 1),$

where the normalizing sequence is defined by

$U_n(x^*) = \sigma^2 \sum_{k \in \{-n,\dots,n\}^d} \Big( \sum_{j=1}^m \frac{1}{(2n+1)^{d-d_j}}\, w_{k_{I_j},n}(x^*_{I_j}) \Big)^2,$

the weights $w_{k_{I_j},n}$ are defined in (2.19) and

E[8 add > FD (x*)] = 6 add (x*) + 0(h 3 - 1 ) + o( 

Remark 3.8

(1) The normalizing sequence $U_n(x^*)$ in (3.8) is of order $\frac{1}{n^d h^{d_{j^*} + 2\beta_{j^*}} a_n^{d_{j^*}}}$.

(2) The bias of the additive estimator in the fixed design case is only vanishing if the subsets $I_j$ in the decomposition (2.7) satisfy $d_j < 3$ for all $j = 1, \dots, m$.

(3) Theorem 3.7 can easily be extended to multiplicative models of the form (1.1) with

$\theta(x^*) = \prod_{j=1}^m \theta_{I_j}(x^*_{I_j})$

if the convolution function $\psi$ is also multiplicative. Otherwise the estimator is not consistent and other techniques such as the marginal integration method have to be used.



4 Dependent data 

In this Section we briefly discuss the case of dependent data. To be precise we assume that the errors in the inverse regression models have an MA($q$) structure. Under the random design assumption this structure is given by

(4.1)  $\varepsilon_t = Z_t + \beta_1 Z_{t-1} + \dots + \beta_q Z_{t-q},$

where $\{Z_t\}_{t \in \mathbb{Z}}$ denotes a white noise process with variance $\sigma^2$. A careful inspection of the proof of Theorem 3.1, which is based on the investigation of the asymptotic properties of cumulants, shows that the result of Theorem 3.1 remains valid under this assumption.
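The error structure (4.1) and its second-order properties can be sketched as follows. The function names are hypothetical and the white noise is taken to be Gaussian, which is an assumption beyond what (4.1) requires.

```python
import random


def ma_errors(length, beta, sigma, rng):
    """Errors eps_t = Z_t + beta_1 Z_{t-1} + ... + beta_q Z_{t-q} as in (4.1).

    `beta` = [beta_1, ..., beta_q]; Gaussian white noise Z_t is an assumption.
    """
    q = len(beta)
    coeffs = [1.0] + list(beta)          # beta_0 = 1
    z = [rng.gauss(0.0, sigma) for _ in range(length + q)]
    return [sum(coeffs[j] * z[t + q - j] for j in range(q + 1))
            for t in range(length)]


def ma_autocovariance(beta, sigma, lag):
    """Cov(eps_t, eps_{t+lag}) = sigma^2 * sum_j beta_j * beta_{j+|lag|}."""
    coeffs = [1.0] + list(beta)
    lag = abs(lag)
    return sigma ** 2 * sum(coeffs[j] * coeffs[j + lag]
                            for j in range(len(coeffs) - lag))
```

The autocovariance vanishes for lags larger than $q$, which is the finite-range dependence that distinguishes the MA($q$) case from the MA($\infty$) case.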



Theorem 4.1 



(1) Consider the inverse regression model (2.2) under the random design assumption (RD). If the assumptions of Theorem 3.1 are satisfied, then

(4.2)  $V_1^{-1/2}\,\big(\hat\theta^{RD}(x^*) - E[\hat\theta^{RD}(x^*)]\big) \Rightarrow \mathcal{N}(0, 1),$



where the normalizing sequence is given by 



1 



nh d (27r) 2d J Rd 



a -i {s ^/ h -v)) **j>) ds ] 2 ^ £Uo fob + 9 2 (hy))f(hy) ^ 



m & x{f(hy)J(±)y 
$\beta_0 = 1$ and $E[\hat\theta^{RD}(x^*)] = \theta(x^*) + O(h^{s-1})$.



(2) If the assumptions of Theorem 3.5 are satisfied, then the appropriately standardized additive estimator $\hat\theta^{add,RD}$ converges weakly to a standard normal distribution, that is

(4.3)  $V_3^{-1/2}\,\big(\hat\theta^{add,RD}(x^*) - E[\hat\theta^{add,RD}(x^*)]\big) \Rightarrow \mathcal{N}(0, 1),$



where the standardizing factor is given by 
1 



2d 



r^)(J2e' i{wi ^L I c{w I c 
i=i 



$ K (hw) , \ 2 (a 3 Et,i=o PkPi + 9(s) 2 )f(s) 

dw - — ; — ; — — -r— as. 



max{/( S ),/(-M} 



and E[6 add ' RD (x*)} = 6 add (x*) + 0(h 



Under the assumption of a fixed design on a grid we consider an error process with an MA($q$) structure defined by

(4.4)  $\varepsilon_k = \sum_{r \in \{-q,\dots,q\}^d} \beta_r\, Z_{k-r},$

where $\{Z_j\}_{j \in \mathbb{Z}^d}$ are i.i.d. random variables with mean zero and variance $\sigma^2$. This means that the noise terms are influenced by all shocks which have a distance on the lattice lower or equal to $q$ with respect to the $\infty$-norm. The following result can be obtained by similar arguments as used for the proof of Theorem 3.7.



Theorem 4.2 Consider the inverse regression model (2.1) under the fixed design assumption with an MA($q$) dependent error process. If the assumptions of Lemma 3.6 are satisfied we have

$V_{MA}^{-1/2}(x^*)\,\big(\hat\theta^{add,FD}(x^*) - E[\hat\theta^{add,FD}(x^*)]\big) \Rightarrow \mathcal{N}(0, 1),$



where the normalizing sequence is given by 



V MA (x*) = a 2 Yl ^ftfn \J2 (2n + l) d -^ Whl ^ X * 1 ^ 2 ' 

l&L d rie|-q,...,<7i d ke{-n,...,n} d j=l 



ne{-g,...,<j} 

IUI|oo<2g 

and E[6 add ' FD (x*)] = 6 add (x*) + 0{h s ~ l ) 



)• 



Remark 4.3 If $\varepsilon_t$ has an MA($\infty$) representation, Theorems 4.1 and 4.2 will not hold in general, because without additional assumptions the $l$-th cumulant of the normalized statistic does not converge to zero for all $l \ge 3$.
5 Finite sample properties

In this Section we investigate the finite sample properties of the new estimators and also provide a comparison with competing methods. We first investigate the case of a fixed design in model (1.1) with the convolution function

$\psi(x_1, x_2) = \frac{9}{4}\, e^{-3(|x_1| + |x_2|)},$



and two additive signals 



(5.1) 9 {1) (x 1 ,x 2 ) = e-^' - 1 ) + e"^- - 4 ) 

(5.2) {2 \x 1 ,x 2 ) = e" 1 " 1 " - 41 + 2e- 2x * 

For the kernel K in the Fourier transform we use the kernel K(~x) = sm ( x ^) sm ( X2 ) We consider 
a fixed design on the grid {(^-, ^f- | k±,k 2 G {— n, ...,n}} with N = {2n + l) 2 points where 
n G {30,50}. In both cases we choose the design parameter as a n = 0.25, such that the cube 
f— , — l 2 covers most of the region where the functions 9^> and 9^ 2 > deviate significantly from 0. 
In all simulations we use (independent) noise terms, which are normal distributed with mean 
and variance 0.25. 
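
The design described above can be set up in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the signal is taken to be $\theta^{(1)}(x_1,x_2) = e^{-(x_1-0.1)^2} + e^{-(x_2-0.4)^2}$, an assumption that reproduces the true values $\theta^{(1)}(\mathbf{x})$ listed in Table 2.

```python
import numpy as np

def theta1(x1, x2):
    """Additive signal assumed for (5.1)."""
    return np.exp(-(x1 - 0.1) ** 2) + np.exp(-(x2 - 0.4) ** 2)

def fixed_design_grid(n, a_n):
    """Grid points (k1/(n a_n), k2/(n a_n)) with k1, k2 in {-n, ..., n}."""
    k = np.arange(-n, n + 1)
    pts = k / (n * a_n)
    return np.meshgrid(pts, pts, indexing="ij")

# n = 50 and a_n = 0.25 give N = (2n + 1)^2 = 10201 points on the cube [-4, 4]^2
x1, x2 = fixed_design_grid(n=50, a_n=0.25)
signal = theta1(x1, x2)
```

With n = 30 the same function yields the N = 3721 design points used for the comparison of the estimators.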



The bandwidth $h$ in the estimator (2.17) is chosen such that the mean integrated squared error
(MISE)

$E\Big[ \int \big( \hat\theta(\mathbf{x}) - \theta(\mathbf{x}) \big)^2 \, d\mathbf{x} \Big]$

is minimized. Figure 1 shows a typical example of the MISE as a function of the bandwidth $h$.
Figure 2 shows the contour plot of the function $\theta^{(1)}$ defined in (5.1) and contour plots of three
typical additive estimates where $n = 50$ and the bandwidths are chosen as $h = 0.32, 0.36, 0.4$ (the
bandwidth $h = 0.36$ minimizes the MISE). We observe that the shapes in all figures are very
similar. The bandwidths $h = 0.32$ and $h = 0.4$ yield stronger deviations from the true function,
especially at the boundary, but even for these choices the main structure is still recovered. Because
other simulations showed a similar picture, we conclude that small changes in the bandwidth do
not affect the general structure of the estimator significantly.
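
In a simulation the MISE is not available in closed form and has to be approximated. A minimal sketch of one common approach (an assumption, not necessarily the procedure used here) replaces the expectation by an average over replications and the integral by a Riemann sum over grid cells; all names and the example numbers are hypothetical.

```python
import numpy as np

def mise(estimates, truth, cell_area):
    """Monte Carlo approximation of E[ integral (theta_hat - theta)^2 dx ].

    estimates: array of shape (R, n1, n2), one estimated surface per replication
    truth:     array of shape (n1, n2), the true signal on the same grid
    cell_area: area of one grid cell, used as the Riemann-sum weight
    """
    ise = ((estimates - truth) ** 2).sum(axis=(1, 2)) * cell_area
    return ise.mean()

def select_bandwidth(mise_by_h):
    """Return the bandwidth with the smallest approximated MISE."""
    return min(mise_by_h, key=mise_by_h.get)

# Hypothetical MISE values for three candidate bandwidths
best_h = select_bandwidth({0.32: 0.050, 0.36: 0.040, 0.40: 0.045})
```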



In order to investigate the finite sample properties of the new estimate $\hat\theta^{add,FD}$ defined in (2.16)
we performed 1000 iterations with the signal $\theta^{(2)}$ (the results for the signal $\theta^{(1)}$ are similar and
are not depicted for the sake of brevity). The simulated mean, variance and mean squared error
(MSE) of $\hat\theta^{add,FD}$ are given in Table 1 for different choices of $\mathbf{x} = (x_1, x_2)$, where the sample size is
$N = 10201$ and the variance of the errors is 0.25. We observe that in most cases the mean squared
error is dominated by the bias.
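
The statement about the bias follows from the decomposition MSE = variance + bias$^2$. The short sketch below (function names are illustrative) applies it to the first row of Table 1, with true value 0.1473, simulated mean 0.2522 and simulated variance 0.0017.

```python
def mse(mean, variance, truth):
    """Mean squared error as variance plus squared bias."""
    return variance + (mean - truth) ** 2

def bias_share(mean, variance, truth):
    """Fraction of the MSE that is due to the squared bias."""
    return (mean - truth) ** 2 / mse(mean, variance, truth)

# First row of Table 1: the result agrees with the reported MSE of 0.0127
row_mse = mse(mean=0.2522, variance=0.0017, truth=0.1473)
row_share = bias_share(mean=0.2522, variance=0.0017, truth=0.1473)
```

For this row the squared bias accounts for well over half of the MSE.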
Figure 1: MISE of the estimator $\hat\theta^{add,FD}$ for different bandwidths in model (5.1), where $\sigma = 0.5$.



N       x1     x2     theta^(2)(x)   E[theta_hat(x)]   Var(theta_hat(x))   MSE(theta_hat(x))
10201   -1.6   -1.6   0.1473         0.2522            0.0017              0.0127
10201   -1.6   -0.8   0.3131         0.3805            0.0017              0.0063
10201   -1.6    0     0.6823         0.8296            0.0017              0.0234
10201   -1.6    0.8   0.6823         0.8159            0.0017              0.0195
10201   -1.6    1.6   0.3131         0.3827            0.0017              0.0065
10201   -0.8   -1.6   0.6914         0.8216            0.0017              0.0187
10201   -0.8   -0.8   0.8573         0.9446            0.0018              0.0094
10201   -0.8    0     1.2264         1.3977            0.0017              0.0310
10201   -0.8    0.8   1.2264         1.3864            0.0017              0.0273
10201   -0.8    1.6   0.8573         0.9496            0.0018              0.0103
10201    0     -1.6   2.1353         2.1887            0.0018              0.0046
10201    0     -0.8   2.3012         2.3123            0.0017              0.0018
10201    0      0     2.6703         2.7640            0.0018              0.0106
10201    0      0.8   2.6703         2.7548            0.0016              0.0087
10201    0      1.6   2.3012         2.3178            0.0018              0.0020
10201    0.8   -1.6   0.6914         0.8181            0.0017              0.0178
10201    0.8   -0.8   0.8573         0.9445            0.0018              0.0094
10201    0.8    0     1.2264         1.3967            0.0017              0.0307
10201    0.8    0.8   1.2264         1.3864            0.0017              0.0273
10201    0.8    1.6   0.8573         0.9496            0.0018              0.0103
10201    1.6   -1.6   0.1473         0.2532            0.0016              0.0128
10201    1.6   -0.8   0.3131         0.3785            0.0017              0.0060
10201    1.6    0     0.6823         0.8290            0.0018              0.0233
10201    1.6    0.8   0.6823         0.8168            0.0019              0.0200
10201    1.6    1.6   0.3131         0.3855            0.0017              0.0069

Table 1: Mean, variance and mean squared error of the new additive estimator $\hat\theta = \hat\theta^{add,FD}$ in the
case of a fixed design. The model is given by (5.2) with variance $\sigma^2 = 0.25$.



Figure 2: Contour plot of the function $\theta^{(1)}$ defined in (5.1) (upper left panel) and its estimates
$\hat\theta^{add,FD}$ defined in (2.16) with different bandwidths. Upper right panel: $h = 0.32$; lower left panel:
$h = 0.36$ (which minimizes the MISE); lower right panel: $h = 0.4$.

In the second part of this section we compare three different estimates for the signal in the inverse
regression model (1.1). The first estimate for $\theta$ is the statistic $\hat\theta^{add,FD}$ proposed in this paper
[see formula (2.16)]. The second method is the marginal integration estimator suggested by Birke
et al. (2012) and the third method is the non-additive estimate of Birke and Bissantz (2008). The
results are shown in Table 2 for the sample size $N = 3721$ and selected values of the predictor.
We observe that the additive estimate of Birke et al. (2012) improves the unrestricted estimate
with respect to mean squared error by 20-50%. However, the new additive estimate $\hat\theta^{add,FD}$ yields
a much larger improvement: its MSE is about 14 times smaller than the MSE obtained
by the unrestricted estimator and about 7-10 times smaller than the MSE of the estimator proposed by Birke et al. (2012). Further simulations
for the signal $\theta^{(2)}$ in (5.2) show similar results and are not depicted for the sake of brevity.
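
The quoted improvement factors can be checked directly from the MSE columns of Table 2. The following sketch only reproduces this arithmetic; the variable names are illustrative, and the numbers are the reported MSE values at the four points (0,0), (0,1), (1,1) and (1,1.8).

```python
# MSE values reported in Table 2 (N = 3721)
mse_fd = [0.0435, 0.0443, 0.0470, 0.0533]       # unrestricted, fixed design
mse_bbh = [0.0214, 0.0210, 0.0218, 0.0408]      # Birke et al. (2012)
mse_add_fd = [0.0031, 0.0032, 0.0036, 0.0036]   # new additive estimator

# Factor by which the new additive estimator reduces the MSE
ratio_fd = [a / b for a, b in zip(mse_fd, mse_add_fd)]
ratio_bbh = [a / b for a, b in zip(mse_bbh, mse_add_fd)]

# Relative MSE reduction of Birke et al. (2012) over the unrestricted estimator
reduction_bbh = [1.0 - b / a for a, b in zip(mse_fd, mse_bbh)]
```

The ratios against the unrestricted estimator lie between 13 and 15, and those against the estimator of Birke et al. (2012) between roughly 6 and 11.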
Estimator          N      x1   x2    theta^(1)(x)   E theta_hat(x)   Var theta_hat(x)   MSE theta_hat(x)
theta_hat^RD       3721   0    0     1.8422         1.9667           0.0516             0.0671
                   3721   0    1     1.6877         1.6983           0.0458             0.0459
                   3721   1    1     1.1425         1.1909           0.0329             0.0352
                   3721   1    1.8   0.5857         0.6624           0.0189             0.0248
theta_hat^add,RD   3721   0    0     1.8422         1.8680           0.0440             0.0301
                   3721   0    1     1.6877         1.6405           0.0195             0.0217
                   3721   1    1     1.1425         1.3371           0.0232             0.0610
                   3721   1    1.8   0.5857         0.8184           0.0199             0.0740
theta_hat^FD       3721   0    0     1.8422         1.8123           0.0426             0.0435
                   3721   0    1     1.6877         1.7305           0.0425             0.0443
                   3721   1    1     1.1425         1.2143           0.0418             0.0470
                   3721   1    1.8   0.5857         0.4774           0.0416             0.0533
theta_hat^add,FD   3721   0    0     1.8422         1.8234           0.0027             0.0031
                   3721   0    1     1.6877         1.6589           0.0024             0.0032
                   3721   1    1     1.1425         1.1097           0.0025             0.0036
                   3721   1    1.8   0.5857         0.5494           0.0023             0.0036
theta_hat^BBH      3721   0    0     1.8422         1.8874           0.0194             0.0214
                   3721   0    1     1.6877         1.7316           0.0191             0.0210
                   3721   1    1     1.1425         1.1833           0.0201             0.0218
                   3721   1    1.8   0.5857         0.4438           0.0207             0.0408

Table 2: Mean, variance and mean squared error of the unrestricted estimator $\hat\theta^{FD}$ proposed by
Birke and Bissantz (2008), the estimator $\hat\theta^{BBH}$ proposed by Birke et al. (2012) and the new
estimators $\hat\theta^{RD}$, $\hat\theta^{add,RD}$ and $\hat\theta^{add,FD}$ proposed in this paper. The model is given by (5.1), where
$\sigma^2 = 0.25$.



For the sake of comparison, the first two blocks of rows of Table 2 contain results for the estimators $\hat\theta^{RD}$ and
$\hat\theta^{add,RD}$, where the explanatory variables follow a uniform distribution on the same cube $[-\frac{1}{a_n}, \frac{1}{a_n}]^2$
as used for the fixed design. We observe a similar behaviour of the unrestricted estimators under
the fixed and random design assumption. This corresponds to the asymptotic theory, which shows
that in the case of a uniform distribution the unrestricted estimators converge with the same rate
of convergence (see Remark 3.2). On the other hand, the additive estimator $\hat\theta^{add,RD}$ produces a
substantially larger mean squared error than the estimator $\hat\theta^{add,FD}$; its mean squared error is of similar size
as that of the estimator proposed by Birke et al. (2012).

Because the performance of the estimators depends on the correct specification of the convolution
function $\psi$, we next investigate the performance of the estimators under misspecification of the
function $\psi$. In Figure 3 we display the contour plots of the estimates $\hat\theta^{add,FD}$, where in every panel
the convolution function is misspecified as a Laplace distribution Lap($\alpha$, $\beta$) with parameters
$\alpha = 0$ and $\beta = 1/3$. In the upper left and upper right panels the parameter $\beta$ of the Laplace
distribution Lap($\alpha$, $\beta$) is misspecified, whereas in the lower left panel the true convolution
function is the density of a standard normal distribution and in the lower right panel it is the density of a gamma
distribution. We observe that a misspecification of the shape of the convolution function (as it
occurs if a Laplace density is used instead of the density of a Gamma(3,2) distribution) yields
an estimator with a different structure than the true signal (see the lower right panel in Figure 3).
All other panels show the same structure as the upper left panel of Figure 2, which gives the contour
plot of the true signal. This indicates that the structure of the signal can be reconstructed
as long as the chosen convolution kernel exhibits similar modal properties as the "true kernel".
However, we also observe from Figure 3 that the levels of the contours differ from those of the true
signal.





Figure 3: Contour plot of the estimate $\hat\theta^{add,FD}$ of $\theta^{(1)}$ with misspecified convolution function.
In every panel $\psi$ is misspecified as Lap(0, 1/3). Upper left panel: the true convolution function is
Lap(0,1); upper right panel: the true convolution function is Lap(0, $\beta$) with a different scale parameter $\beta$;
lower left panel: the true convolution function is $N(0,1)$; lower right panel: the true convolution function is
Gamma(3,2). The model is given by (5.1), where $\sigma^2 = 0.25$.
We conclude this section with a brief discussion of the performance of the unrestricted estimator
$\hat\theta^{RD}$ under the assumption (RD) of a non-uniform random design. In Table 3 we display the
simulated mean, variance and mean squared error for various distributions of the predictor $X$,
where the components are independent and identically distributed. In most cases we observe
similar results for the bias, independently of the distribution of $X$ and the choice of the sequence $a_n$.
On the other hand the mean squared error is dominated by the variance, which depends sensitively
on the choice of the parameter $a_n$. This observation corresponds with the representation of the
asymptotic variance of $\hat\theta^{RD}$ in formula (3.3) of Theorem 3.1. We also observe that the impact of
the distribution of the explanatory variable on the variance of the estimate $\hat\theta^{RD}$ is much smaller.



X                   N       a_n    x1   x2    theta^(1)(x)   E theta_hat(x)   Var theta_hat(x)   MSE theta_hat(x)
U[-1/a_n, 1/a_n]    10201   0.25   0    0     1.8422         1.7421           0.0297             0.0397
                    10201   0.25   0    1     1.6877         1.7163           0.0272             0.0283
                    10201   0.25   1    1     1.1425         1.2858           0.0194             0.0399
                    10201   0.25   1    1.8   0.5857         0.6105           0.0117             0.0123
                    10201   0.5    0    0     1.8422         1.4957           0.0076             0.1277
                    10201   0.5    0    1     1.6877         1.8123           0.0070             0.0225
                    10201   0.5    1    1     1.1425         1.5438           0.0044             0.1654
                    10201   0.5    1    1.8   0.5857         0.5695           0.0023             0.0026
N(0,1)              10201   0.25   0    0     1.8422         1.8512           0.3271             0.3271
                    10201   0.25   0    1     1.6877         1.7019           0.7098             0.7100
                    10201   0.25   1    1     1.1425         1.2038           0.7077             0.7115
                    10201   0.25   1    1.8   0.5857         0.5983           0.4477             0.4479
N(0,1)              10201   0.5    0    0     1.8422         1.8229           0.0079             0.0083
                    10201   0.5    0    1     1.6877         1.7466           0.0107             0.0143
                    10201   0.5    1    1     1.1425         1.2531           0.0114             0.0236
                    10201   0.5    1    1.8   0.5857         0.6366           0.0135             0.0161
t(2)                10201   0.25   0    0     1.8422         1.8758           0.0174             0.0185
                    10201   0.25   0    1     1.6877         1.7129           0.0255             0.0261
                    10201   0.25   1    1     1.1425         1.1786           0.0271             0.0284
                    10201   0.25   1    1.8   0.5857         0.6138           0.0324             0.0332
t(2)                10201   0.5    0    0     1.8422         1.8590           0.0115             0.0118
                    10201   0.5    0    1     1.6877         1.7260           0.0158             0.0173
                    10201   0.5    1    1     1.1425         1.2069           0.0182             0.0223
                    10201   0.5    1    1.8   0.5857         0.6275           0.0174             0.0191

Table 3: Mean, variance and mean squared error of the unrestricted estimator $\hat\theta^{RD}$ proposed in
this paper for different distributions of the explanatory variables $X$ and different choices for the
parameter $a_n$. The model is given by (5.1) and the variance is $\sigma^2 = 0.25$.
6 Appendix

For the proofs we make frequent use of the cumulant method, which is a common tool in time
series analysis. Following Brillinger (2001) the $r$-th order joint cumulant $\mathrm{cum}(Y_1, \dots, Y_r)$ of an
$r$-dimensional complex valued random vector $(Y_1, \dots, Y_r)$ is given by

(5.1) $\mathrm{cum}(Y_1, \dots, Y_r) = \sum (-1)^{p-1} (p-1)! \, E\Big[\prod_{j \in \nu_1} Y_j\Big] \cdots E\Big[\prod_{j \in \nu_p} Y_j\Big],$

where we assume the existence of moments of order $r$, i.e. $E(|Y_j|^r) < \infty$ ($j = 1, \dots, r$), and the
summation extends over all partitions $(\nu_1, \dots, \nu_p)$, $p = 1, \dots, r$, of $(1, \dots, r)$. If we choose $Y_j = Y$, $j =
1, \dots, r$, we denote by $\mathrm{cum}_r(Y) = \mathrm{cum}(Y, \dots, Y)$ the $r$-th order cumulant of a univariate random
variable. The following properties of the cumulant will be used frequently in our proofs [see e.g.
Brillinger (2001)]:

(B1) $\mathrm{cum}(a_1 Y_1, \dots, a_r Y_r) = a_1 \cdots a_r \, \mathrm{cum}(Y_1, \dots, Y_r)$ for constants $a_1, \dots, a_r \in \mathbb{C}$

(B2) if any group of the $Y$'s is independent of the remaining $Y$'s, then $\mathrm{cum}(Y_1, \dots, Y_r) = 0$

(B3) for the random variable $(Z_1, Y_1, \dots, Y_r)$ we have
$\mathrm{cum}(Z_1 + Y_1, Y_2, \dots, Y_r) = \mathrm{cum}(Z_1, Y_2, \dots, Y_r) + \mathrm{cum}(Y_1, Y_2, \dots, Y_r)$

(B4) if the random variables $(Y_1, \dots, Y_r)$ and $(Z_1, \dots, Z_r)$ are independent, then
$\mathrm{cum}(Y_1 + Z_1, \dots, Y_r + Z_r) = \mathrm{cum}(Y_1, \dots, Y_r) + \mathrm{cum}(Z_1, \dots, Z_r)$

(B5) $\mathrm{cum}(Y_j) = E(Y_j)$ for $j = 1, \dots, r$

(B6) $\mathrm{cum}(Y_j, Y_j) = \mathrm{Var}(Y_j)$ for $j = 1, \dots, r$

We finally state a result which can easily be proven by using the definition (5.1) and the properties
of the mean.

Theorem 6.1 Let $Y = (Y_1, \dots, Y_r)$ be a random variable, $b_n$ a sequence and $C > 0$ a constant
with

$E\Big[\prod_{j=1}^{l} |Y_{i_j}|\Big] \le C^l b_n^l$ for all $1 \le l \le r$,

then $|\mathrm{cum}(Y_{i_1}, \dots, Y_{i_m})| \le (m-1)! \, C^m b_n^m \sum_{j=1}^{m} S_{m,j}$, where $S_{m,j}$ denotes the Stirling number of the
second kind.
We will also make use of the fact that the normal distribution with mean $\mu$ and variance $\sigma^2$ is
characterized by its cumulants, where the first two cumulants are equal to $\mu$ and $\sigma^2$ respectively
and all cumulants of larger order are zero. To show asymptotic normality in our proofs we have
to calculate the first two cumulants, which give the asymptotic mean and variance, and show in
a second step that all cumulants of order $l \ge 3$ vanish asymptotically. In the following
discussion all constants which do not depend on the sample size (but may differ in different steps
of the proofs) will be denoted by $C$.
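
The partition formula for joint cumulants and the characterization of the normal distribution can be checked numerically. The following is a minimal sketch, not part of the proofs: the function names are hypothetical, and `moment` is assumed to return the expectation of the product of the components indexed by a block. For a univariate normal variable the raw moments up to order four are known in closed form, so the cumulants of order three and four must come out as zero.

```python
from math import factorial

def set_partitions(items):
    """Yield all partitions of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or let it form a block of its own
        yield [[first]] + partition

def joint_cumulant(moment, r):
    """Sum over partitions of (-1)^(p-1) (p-1)! times the product of block moments."""
    total = 0.0
    for partition in set_partitions(list(range(r))):
        p = len(partition)
        term = (-1) ** (p - 1) * factorial(p - 1)
        for block in partition:
            term *= moment(tuple(block))
        total += term
    return total

def normal_raw_moment(k, mu, sigma2):
    """Raw moments E[Y^k], k <= 4, of a normal variable with mean mu, variance sigma2."""
    table = {
        0: 1.0,
        1: mu,
        2: mu**2 + sigma2,
        3: mu**3 + 3 * mu * sigma2,
        4: mu**4 + 6 * mu**2 * sigma2 + 3 * sigma2**2,
    }
    return table[k]

def normal_cumulant(r, mu, sigma2):
    """r-th cumulant of a univariate normal variable via the partition formula."""
    return joint_cumulant(lambda block: normal_raw_moment(len(block), mu, sigma2), r)
```

For example, `normal_cumulant(1, 1.0, 4.0)` and `normal_cumulant(2, 1.0, 4.0)` recover the mean and the variance, in line with (B5) and (B6).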

Proof of Theorem 3.1. For the sake of brevity we write $\hat\theta$ instead of $\hat\theta^{full,RD}$ throughout this
proof. By the discussion of the previous paragraph we have to calculate the mean and the variance
of $\hat\theta(x^*)$ and all cumulants of order $l \ge 3$. We start with the mean conditional on $X = (X_1, \dots, X_n)$,
which can be calculated as

$E[\hat\theta(x^*) \mid X] = \sum_{k=1}^{n} g(X_k) \, w_n(x^*, X_k),$

where the weights $w_n$ are defined in (2.6). By iterative expectation we get

/(x) 



E[6(*)] 



2(x) 



-i(s,(x*— x))//i 



Ms 



^(f)max{/(x) ) /(i)} 



dsdjc. 



h d (2n) d 

which yields a bias of the form bias§ = E[9(x*)) — 0(x*) = A 1 + A 2 , where (note that $ 9 = $^ " $o) 
A 1 = 7 ^/ C ^>^(B)*.(i)A-^) 



A, 



h d (2Tr) c 
1 

h d {2n) c 



-i(s,x*)//i 



Mg) 



9 



(x)e*< s ' x W 



/(x) 



Vmax{/(x),/(i)} 



1 ) dxcfe 



For the summand $A_1$ we can use exactly the same calculation as in Birke and Bissantz (2008) to
obtain $A_1 = O(h^{s-1})$. For the second term $A_2$ we have



A, < 



< 



h d (2n) d 
C 

h<l+P(2Tl) d J ([ 



IMg) 
I'M!) 



an' a n ' ' 



l<?(x) 



9W 



/(x) 



max{/(x),/(i)} 
/(x) 



max{/(x),/(i)} 



dxds 
dx, 



where we used Assumption 1(A) and 4 in the last inequality. In the next step we will use the fact
that $0 \le f(x)/\max\{f(x), f(1/a_n)\} \le 1$ and Assumption 3(B) to obtain



c 



^+0(271-) 



1 



([-— .— } d ) 

VL an ' an ' ' 



9W 



-dx. = O 



h d +? 



0{h 



s-l^ 
This shows that the bias of $\hat\theta(x^*)$ is of order $O(h^{s-1})$. By the definition of $\hat\theta(x^*)$ and (2.6) it follows

1 



V(9(3*)\X) = 

which yields 

E[V(0(x*)\X)} 



a 



n 2 h 2d (2ir] 



2,1 



£ 

k=l 



-i( B ,(x*-1t k ))/h*K 



$K-(sK 2 

as 



Mt 



max{/(X fc ),/(^-)} 2 



a 



nh d (2n) 2d V 



-i< B ,(x*//«-y)> 



as 



mi: 



/(*y) 



rrr^y- 



max{/(ay),/(£)}2 



The variance of the conditional expectation is given by (observe again the definition of the weight
$w_n$ in (2.6))



n 

V(E[9(x*)\X}) = v{^2g(X k )w n (x* t X k fj 



k=l 
1 



nh d (27r) 2d 
1 



-»(s,(x*/fe-y)) $ir( s ) 



ds 



2 g 2 (h y )f(h y ) 



n(27r; 



2</ 



-i(s,(x»/fe-y)) ^-g( S ) 



max{/(ay),/(^)}2 



<Mf) max {/(fey), /(i)} 



where the second summand is of order $O(n^{-1})$. Thus the variance can be written as



(5.2) V{6(y*)) = E[V(9(x*)\X)} + V(E[6(x*)\X}) 



nh d (2ir] 



2d 



-i(s,(x*/h-y)) 



Ms) 



rfs 



max{/(ay),/(^-)}2 



^rfy + O^- 1 ) 



and the rate of convergence has a lower bound given by

$V(\hat\theta(x^*))^{-1/2} = \Omega\big(n^{1/2} h^{\beta + d/2} f(a_n^{-1})^{1/2}\big),$

where the symbol $b_n = \Omega(c_n)$ means that there exists a constant $C$ and $n_0 \in \mathbb{N}$ such that for all
$n > n_0$ we have $|b_n| \ge C|c_n|$. The variance has a lower bound


1 



> 



nh d (2Ti) 2d 
C 



\2d 



(\^- - J —] d ) 

K *-ha n ' ha n ' ' 



-i{s,(x*/h-y)) _ 



-»(s,(x*/M> * K 



MV> 



mi: 



ds 



ds 



2(a 2 + g 2 ( h y))f( h y) 

f{hy) 2 



dy 



'dy = C(nh d+2fi )- 1 (l + o(l)), 



nh d (27r)— /a— 1 1 -'i 

v ' ^ha„ ' ha„ 1 > 

where we used Assumption 4 and Parseval's equality. This yields the upper bound

(5.3) $V(\hat\theta(x^*))^{-1/2} = O\big(n^{1/2} h^{\beta + d/2}\big).$
For the proof of asymptotic normality we now show that the $l$-th cumulant $G_l = |\mathrm{cum}_l(V(\hat\theta(x^*))^{-1/2} \hat\theta(x^*))|$
is vanishing asymptotically, whenever $l \ge 3$. For this purpose we recall the definition of the weights
$w_n$ in (2.6) and obtain from (5.3) the estimate

n 

(5.4) d < Cn l ' 2 h ip+dl ' 2 \cum(Y kl w n (x*,X kl ),...,Y kl w n (x*,X kl ))\ 

fcl,...,fc; = l 
n 

= Cn l l 2 h l ^ dl ' 2 Y, \ cum i (IWx*, X*)) | 
fe=i 

= C n l l 2+l h l P +dl ' 2 l(cnm(^ 1 U 7 n (x*,X 1 ),...,^t i ;„,(x*,X 1 ))|, 

J6{0,1} ! 

where we used (B2) and the notation U° = g(X.i) and U 1 = e. This term can be written as 
Cn l/2+1 h ll3+dl/2 Y ( Z ) |«im(?7 il w ri (x*,X 1 ),...,i7-''tw n (x*,Xi))|. 

jl+...+jl=S 



By using the product theorem for cumulants [see e.g. Brillinger (2001J)] , we obtain 
(5.5) 



s=o ^ ' j 6 {o,ii ! v k=l 



j£{o,i}' 



where the third sum is calculated over all indecomposable partitions v — (z/i, u p ) of the table 

An A i2 

An A l2 

A; a 



Aij 

(here the first s rows have two and the last / — s rows have one column) and 

An = e 1 < i < s 

A i2 = w n (x*,Xi)) 1 < i < s 

A i:j = 0(XiK(x*, Xi)) s + 1 < i < I. 

As $\varepsilon$ is independent of $X$, only those indecomposable partitions yield a non-zero cumulant which
separate all $\varepsilon$'s from the other terms. This means that for a partition $\nu$ there are $m(\nu)$ sets
$\nu_1, \dots, \nu_{m(\nu)}$ which include only $\varepsilon$'s, while $\nu_{m(\nu)+1}, \dots, \nu_p$ contain only $w_n(x^*, X)$'s and $g(X) w_n(x^*, X)$'s.

Thus (5.5) can be written as



l m{v) P 

(5.6) Cn l / 2+1 h l ^ 2 J2( E Ell 

cum Sk (e) Yi cum(A ij ,ij G v k 

s=0 ^ ' iF-fO.lV v k=l fe=m(f)+l 



j£{0,l} 

ji+...+ii=s 



with 

A^- = w„(x* - Xi)) 1 < i < s 

Mj = #(X)u> n (x* - Xi)) s + 1 < i < I. 

and $s_1 + \dots + s_{m(\nu)} = s$. Furthermore we have $s_j \ge 2$, because the noise terms $\varepsilon$ have mean
zero, and each set $\nu_{m(\nu)+1}, \dots, \nu_p$ includes at least one $A_{ij}$ with $1 \le i \le s$, because otherwise
the partition would not be indecomposable. Let $a_r = |\nu_r|$ denote the number of elements in
the set $\nu_r$ ($r = m(\nu) + 1, \dots, p$), then we get $a_{m(\nu)+1} + \dots + a_p = l$. Furthermore for $r \in \{m(\nu) +
1, \dots, p\}$ the cumulant $\mathrm{cum}(A_{ij}, ij \in \nu_r)$ equals

(5.7) $\mathrm{cum}\big(g(X_1) w_n(x^*, X_1), \dots, g(X_1) w_n(x^*, X_1), w_n(x^*, X_1), \dots, w_n(x^*, X_1)\big)$

because of the symmetry of the arguments in the cumulant. In the next step we denote by $b_r$ the
number of components of the form $g(X_1) w_n(x^*, X_1)$ and show the estimate



(5.* 



E[f[\g(X 1 )w n ^*,X 1 ))\ J] M^Xx))! 



< 



i=l 



(which does not depend on $b_r$). From Theorem 6.1 we then obtain that the term in (5.7) is of
order $O\big(n^{-a_r} h^{-a_r(d+\beta)} f(1/a_n)^{-a_r}\big)$. Equations (5.4), (5.6) and (5.7) yield for the cumulants of
order $l \ge 3$



G t < Cn l / 2+1 h ll3+dl / 2 J2 



m(u) 



e | e n cum s k { £ ) n n o r / l o 1 .i./+.()/'(_L)' (1 



j6{0,l}' v k=l 

jl+-+jl=s 



O^ri^hWffa- 1 ) 1 )- 1 ) = o(l), 



which shows the asymptotic normality.

In order to prove the remaining estimate (5.8) we use the definition of $w_n(x^*, X_1)$ and obtain for
the term on the left hand side of (5.8)

1 



\br 



< 



< 



c 



c 



nh d (2n 

\g(x)\ br 



,-i<s,(x*-x))//i 



1 



^(f)max{/(x),/(i)} 
1 



ds 



/(x)dx 



|^(|)imax{/(x),/(i)} 
/v ds^) /(x)dx, 



rrr ds ) /( x ) rfx 



where we used the fact that $g$ is bounded. Using this inequality and Assumption 1(A) it follows
that $L_n \le C / \big(n^{a_r} h^{a_r(d+\beta)} f(1/a_n)^{a_r}\big)$, which proves (5.8).

Proof of Lemma 3.3. Similar to the proof of Theorem 3.1 we have to calculate the cumulants
of the estimators $\hat\alpha_{j,Q_{I_j^c}}(x^*_{I_j})$. We start with the first order cumulant



h d (2n) d V-d 



0* 



$^(s) e -i(B,(x«-x))/ ft/(x) 
M|)max {/ (x), /( ^) } ™^ X ^ 



and with the same arguments as in the proof of Theorem 3.1, we obtain a bias of order $O(h^{s-1})$.
For the calculation of the variance of $\hat\alpha_{j,Q_{I_j^c}}(x^*_{I_j})$ we investigate its conditional variance. Recalling
the definitions (2.6) and (2.20) it follows by a straightforward argument



^(a,. Q/9 (x*)|X) 



n 2 h 2d (2ir) 



2d 



E 

fc=i 



-j{w,X fc >//i e i<w 7j pcj^.)//! r 



max{/(X fc ),/(^-)} 2 ' 



which gives 



/(*) 



-c?x. 



max{/(x),/(i)P 
The variance of the conditional expectation can be calculated as 

1 



y(£[d,, Q/c (x*)|x]) 



nh d (27r) 2d J ud 



x 



dx 



-dw 



max{/(/*), 
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n(2n) 2d 

g(hx)f(hx 



JR d JR d 3 \ h 



-dw 



x 



-<ix 



w /f \ & K (w 



h ) $ 



max{f(hx),f{±)} 
where the second summand is of order 0(n _1 ). Therefore it follows 

max{/(/iy),/(-^)} 2 



The upper bound for this term is obtained from Assumption [4] which gives 

{a 2 + g{hyf)f{hy) 



-dw 



(5.9) 



nu«{/(/iy),/(i)} 2 
Therefore an application of Parseval's equality and Assumption ^C) yields 

C 



(5.10) 



^• Q/C (x*)< 



A similar argument as in the proof of Theorem 3.1 gives the lower bound V(aj^q IC (x* I .) > 
C / nh d+2/3 ~" ,: > . Finally the statement that the /-th cumulant of V(&j t Q IC (x}.) _1 / 2 dj ) Q JC (xj.) is of 



order $o(1)$ can be shown by similar arguments as in the proof of Theorem 3.1.



Proof of Theorem 3.5. The proof follows by similar arguments as given in the previous
sections. For the sake of brevity we restrict ourselves to the calculation of the first and second
order cumulants. For this purpose we show that the estimate $\hat c$ has a faster rate of convergence
than $\hat\alpha_{j,Q_{I_j^c}}(x^*_{I_j})$ for at least one $j \in \{1, \dots, m\}$. If this statement is correct, the asymptotic variance
of the statistic



3=1 



is determined by its first term. Recalling the notation (2.12) this term has the representation 

m m n 

(5.ii) D n = £<w x y = EE y ^( x l» x *) 

j=l 3 j=l k=l 
and can be treated in the same way as before. The resulting bias of $D_n$ is the sum of the biases
of the individual terms and therefore also of order $O(h^{s-1})$. The conditional variance is given by



n m 



v(D n \x) = * 2 J2\J2< dd ' RD (*l^ 

k=l 3=1 



a 



n 2 h 2d (2ix) 



2d 



k=l 



3=1 



A h J J 



dw 



max{/(X fe ),/(i)} 
This yields for the expectation of the conditional variance
E[V(D n \X)] 

(^\ ' w/ x; ; 7. /: (' 



n / l 2d( 27r )2d J Rd 



3 i(w,s)//i , 



3=1 



h JJ^(f 

and the variance of the conditional expectation is obtained as 
V(E[D n \X]) 



-dw 



max{/(s),/(i)P 



ds 



1 



nh d (27r) 2d 
1 



3 = 1 



-dw 



2 g(hs) 2 f(hs) 
max{f(hs)J(±)} 2 



ds 



n(2n) 



2d 



-dw 



g(hs)f(hs) 



max{/(/is),/(-M} 



c/s 



where the second summand is of order $O(n^{-1})$. This yields for the variance



V(D r 



a 2 +^(s) 2 )/(s) 



w /, c \\ ^(w 



-dw 



x 



-j— Y<is + 0(n 



3- 



max{/(B),/(i)} 

In order to obtain bounds for the rate of the variance, we use the lower bound for max{/(/is), f (-£-)} 
mentioned in (5.9) and Parseval's equality which yields 

771 ^ 



1/2 



w\|2" 



dw) = O ({nh d+2/s -^ /(a,, 1 ))" 1 ) 
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as an upper bound, where the last estimate follows from Assumption [5j The lower bound is of 
order Vt{{nh d+2l3 ~ lmin )~ l ) , where we use Assumption [4] and the same calculations as in the previous 
Section. These are in fact the same bounds as for 6tj* y Q IC (xJ.J with j* = argminj^j. This means 
that 

D n - E[D n ) = P (n- 1 /2/ i - (i / 2 -^«»/ 2 /(a 11 - 1 )- 1 / 2 ) 

In the last step we show that the estimate $\hat c$ has a faster rate of convergence. For this purpose we
write $\hat c$ as a weighted sum of independent random variables, that is



#(x*)dQ(x* 



nh d (2ir 



1 n 



k=l 



3 i(w,X fc )/fc 



w, nA ^(w) rfw 



Y, 



h JJ^) max{/(X fc ),/(i)} 



It now follows by similar calculations as given in the previous paragraph and Assumption |5](C) 
that 

m 

V(c) = o(V(£a M 

and thus we can ignore the term c for the calculation of the asymptotic variance of the statistic 

Qadd,RD 



Proof of Lemma 3.6: Observing the representation (2.15) and (2.18) we decompose the 



estimator into its deterministic and stochastic part, that is 



(5.12) 
where 



*™(xy = E ln + E 



2n 



E 



1 



in 



R 



2/i 



(2n + 

1 

(2n + l) d - d ' 



(9h ( z k 7l ) + - + 9lm ( z k /m )K ,n (xj. ) 



ke{-n,...,n} [i 



5^ £k% Zj , n (xl i ) 



ke{-n,...,n} d 



0( 



1 



and Wk 7 .,n(x}.) are defined in (2.19). In a first step we show, that the bias of 9f D is of order 



w2fc 2+d J +/3 Ja 3 )■ For this purpose we rewrite the deterministic part as 

F - F {1) 4- F {2) 



where 



E 



(i) 

In 



0/ (z k jw k 7 „(X/ 



kjj. G{-n,...,n} 3 
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E 



(2) 
In 



(2n + 1)'' ''• 

kjc£{— n,...,n} 3 

j 



(W 2 ^) + ■■■ + gij-A^i^) + 9i j+ i(zk Ij+1 ) + ■■■ + gimi^ir, 



X 



W k J;; ,n(x^), 



k /,S{-n,...,n} 3 



where the second summand is of order = o( ^ j+ J d . ] 
3^D). For the difference of the first summand and 9if d (x*j.) we use the same calculation as in 
and Bissantz (2008) and obtain 



0{h s ), which follows from Assumption 



Birke 



- (x r .) = 0(h s -') + 



1 



n 



Note that the Rieman-approximation does not provide an error of order 0((na n )~ d ), but we can 
show that the lattice structure yields an error term of order 0((n 2 h 2 af l )~ l ). In the next step we 
derive the variance of the estimator 9f D . We can neglect the deterministic part E 2n in (5.12) and 
obtain from Parseval's equality and Assumption [ijB) 



a 



(2ra + l) d - d i 



K.«( x i 



*-)| 2 

3 ' *^ 



k/j 6{-n,...,n} J 



(7 



(2n + 1 ) d ~ d i n 2d i h 2d i a n dj (2vr) 2d > 

k/ . e{— n,...,n} J 



-i(w,(x*.-z fc/ ))/h $ K {w) 2 



(2n + l) d - d m d jh d mn 3 (2n) 2d J 

/ e 7 ^ - — T^rdw ds + 0{{na n ) , 



l/(ha n ),l/(ha n )] a i 

a 2 



i(w,(x}.// l -s)) *x(w) 



(2n + l) d - d in d ih d ian ] {27i) 2d i M 

^ / 

(2n + l) d - d in d i/i^an J '(27r) 2 ^ Jr^ l $ ^ 3 (f)l 2 



dw 



ds(l + o(l)) 



«w)| 2 



dw(l + o(l)) 



(T 2 C 



C 



(2n + 1 ) n d i + 2 ^ (hi (2?r) n d /i d i + 2 ^ a£ ' 

For the proof of the asymptotic normality, we finally show that the Z-th cumulant of V (6f D (x* T ^^^Of (>. 
converges to zero for I > 3, which completes the proof of Lemma 3J3 For this purpose we note 
that 
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< 



< 



C 



(n ld / 2 h ld i/ 2 a l n j/2 



e n 



ki,...,k(G{— n,...,n} 



d m=l 



i(w,(x| ))/ft$jc(w y , 

( 3 j — — — dw cwm(£ kl ,...,£ k| ) 



C 

r ,ld/21 1 ld i /2j d j/ 2 

n a a n kie{-n,..,n}<*m=i 



e n 



|$y(w) 

* l<Mf) 



where denotes the /-th cumulant of e. From Assumption 1 it follows that this term is bounded 
by 

° .. (2n + = Cn-WWh-^a"*' 2 , 

n ld / 2 h ld i/ 2 a l n j J2 

which converges to zero for I > 3. 



Proof of Theorem 3.7: In the following discussion we ignore the constant term g = 9q because 
the mean 

* = 5 E «■ 



ke{— n,...,n}° 



is a v n -consistent estimator for this constant and the nonparametric components in (2.13) can 
only be estimated at slower rates. Note that 



& 



add.FD/* 



1 

x ' = Y ^E (2n+1) ^ ^.-K 

ke{-n,...,n} d i=l v ' 



and obtain the asymptotic distribution with the same arguments as in the proof of Lemma 3.6 



Proof of Theorem 4.2. Under the assumption of an MA(q)-dependency structure (4.4) there
are no changes in the calculation of the mean of the estimator $\hat\theta^{FD}_{I_j}$ and we only have to calculate
the cumulants of order $l \ge 2$ in order to establish the asymptotic normality. We start with the
variance, which is given by



(2n + l) 2 ^) 
1 



E 



w^j. ,n(x 7 . Jw k2i/ . , n {^i j )curn{e i!:i , £k 2 



ki,k 2 e{-n,...,n} a 



Y Y Y ^u^nix^Wky^fx^ 



(2n + l) 2 ( d - d i) ^ ^ 

kie{-n,...,n} d k 2 :||k 2 -ki|| 00 <2 (J n,r 2 £{-q,...,q} d 
CUm(/3 Tl Z kl - ri) /3r2^k2-r 2 ) 

E 



(2n + l) 2 ^-^) 



W k 1 , Jj ,n(xJ j >k 2i/j ,n(x^; 



k 1 e{-n,...,ri} d k 2 :||k 2 -k 1 || 00 <2g ne{-g,...,g} d 
CMm(/3 ri Zk 1 _ ri , /3k 2 -ki+n^ki-ri) 
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a 2 



J2 Yl J2 U; k 1 , J . > n(xJ.)w k2I . in (x*.) 



(2n + l) 2 ( d - d i) ^ ^ ^ 

v kie{-n,...,n} d k2:||k2-ki|| 00 <2grie{-(j,...,g} [i 

/3ri/3k 2 -ki+ri 
2 

(2n + l) 2 (^) SEE '^ ^Jx, ),^ . kt l Jx) ^ 

k 1 e{-n,...,n.} d l£Z d rie{-g,...,(j}< i 
||l||oo<2g 

a 2 (l + o(l)) 



2n + 1 Y d ~ d i) 

k 17 .6{-n,...,n} d J leZ d r 1 6{-,,..,,}' ! 
3 ||l||oo<2fl 

where we used a Taylor- approximation for the weights w^.+ki i.,n( x / 3 .) = ^kj. j.,n( x jj.)(l + °(1)) i n 
the last step. This gives the expression for the variance in Lemma [4.2[ For the calculation of the 
cumulants of y- 1 / 2 ^® ' FD we first note that the order of the variance V = V(6^ dd,FD ('Ki.)) can be 



calculated in the same way as in the proof of Lemma 3.6, which gives V — 0(n d h dj 2 ^a n 
Therefore we have to show 

\cu mi {n d l 2 h d ^ 2 ^a^Hf^ FD )\ = n ld / 2 h l ^/ 2+ Pa^/ 2 \cu mi (6*f> FD )\ 

for I > 3. By a straightforward calculation it follows that 

\cu mi {6l D ){*X)\ 

1 NT^ TT ( f -<(w 1 (xl -«» m; .))A^(w) 



e.T_„ »Wm=1 •'J* J 'TW\h' 



(2n + l)^)nW«W kli ... iki ^„ v .. in}d ^ii V M <M 



" (2n + l)^>n«^o£ klr .. jk! £ ) ... Md S |Mf)| rfw ) |mm(£k — £k < 

= ~ m"7T? ^ |cMm(e k ...,e k )| 

( S(2n + l) d , 



(2n + l) l ( d - d ^n ld ih l ^a l n j h ^ 



because by (4.4) kx can be chosen arbitrarily and k 2 , k/ have only (4g+l) possibilities to be cho- 
sen and their bound is independent of n. Thus the Z-th cumulant is of order n~ ld ^ 2+1 h~ ld ^ 2 an d , 
which converges to zero for I > 3. The result for Q add ^ FD follow immediately from the results of 
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^FD 
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